[improvement](filecache) Adapt file cache queue consumption#63504
[improvement](filecache) Adapt file cache queue consumption#63504freemandealer wants to merge 7 commits into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 31083 ms |
TPC-DS: Total hot run time: 170549 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
### What problem does this PR solve?
Issue Number: N/A
Related PR: N/A
Problem Summary: File cache hit ratio metrics are derived from global file cache read bytes, but warmup reads from manual warmup, periodic warmup, event-driven warmup, and rebalance-triggered warmup used to update the same counters as query reads. This polluted the query hit ratio. Mixed hit/miss reads could also be attributed to one source for the whole request. This change skips warmup updates to global file cache read metrics while preserving per-IOContext profile stats, records local/remote/peer bytes by actual returned bytes, and avoids updating metrics for failed reads. It also fixes direct-read partial continuation and no-warmup miss-only hit ratio refresh. After rebase, the warmup metrics UT exposed a separate ASAN issue because the test snapshot helper triggered all metric hooks, including a stale StorageEngine hook that captured a destroyed engine. The test now snapshots FileCacheMetrics directly, and StorageEngine deregisters its hook on destruction.
### Release note
File cache hit ratio metrics now exclude warmup reads.
### Check List (For Author)
- Test: Regression test / Unit Test
- Unit Test: DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh run-be-ut.sh --run --filter=BlockFileCacheTest.cached_remote_file_reader_warmup_does_not_update_global_metrics
- Unit Test: DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 sh run-be-ut.sh --run --filter='BlockFileCacheTest.cached_remote_file_reader*'
- Behavior changed: Yes. Warmup reads no longer contribute to global file cache read metrics used for query hit ratio; per-IOContext profile stats are preserved.
- Does this need documentation: No
Problem Summary: File cache background consumers used fixed intervals and batch sizes for LRU recorder log replay and _need_update_lru_blocks updates. When producers outpaced those consumers, backlog growth was hard to observe and could increase memory pressure. This change adds queue length metrics for LRU recorder log queues, exposes queue-size accessors, supports bounded LRU log replay, and makes both background consumers adapt their interval and batch size according to backlog watermarks. It also slices block LRU update work into smaller lock-hold batches and skips LRU log recording when tail-record retention is disabled.
None
- Test: Unit Test
- `CCACHE_DISABLE=1 DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 EXTRA_CXX_FLAGS='-Wno-error=deprecated-literal-operator' sh run-be-ut.sh --run --filter=BlockFileCacheTest.test_lru_log_replay_bound_and_disable_record`
- `build-support/check-format.sh`
- `git diff --check`
- Tried `build-support/run-clang-tidy.sh --base origin/master --build-dir be/ut_build_ASAN`; it was blocked by pre-existing/file-level diagnostics and system header lookup errors before producing a clean result.
- Behavior changed: Yes. File cache background queue consumers can increase consume frequency and batch size when backlog crosses configured watermarks.
- Does this need documentation: No
Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
a3410c3 to
97a715b
Compare
|
run buildall |
TPC-H: Total hot run time: 29197 ms |
TPC-DS: Total hot run time: 170457 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
I found one correctness issue in the bounded LRU log replay path.
Critical checkpoint conclusions:
- Goal/test: The PR improves file-cache queue consumption and metric accounting, and adds BE/regression coverage, but the bounded replay change can make persisted LRU dumps miss already-recorded events.
- Scope/minimality: The final version is mostly focused and removed the previously raised extra knobs/metrics.
- Concurrency/lifecycle: The issue is a lifecycle interaction between the replay thread and dump thread over the shadow queue and update counter; both use the recorder mutex for the shadow queue, but dump can still observe an intentionally partially replayed state.
- Config/compatibility: No remaining new configs in the final head; no storage format or FE/BE protocol compatibility concern found.
- Parallel paths: The concern applies to all four LRU queues because they all use the same bounded replay and dump path.
- Tests: Unit tests cover bounded replay itself, but do not cover dumping while replay backlog remains pending.
- Observability/performance: The new backlog metric/logging helps observe backlog, but does not prevent stale dumps.
- Transaction/persistence: This is persistence-related for file-cache LRU tail restore; a crash after a stale dump can restore an out-of-date LRU order.
User focus: No additional user-provided review focus was specified.
Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
|
run buildall |
|
/review |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
done fix
TPC-H: Total hot run time: 29742 ms |
TPC-DS: Total hot run time: 169839 ms |
| } | ||
|
|
||
| if (_no_warmup_num_hit_blocks->get_value() > 0) { | ||
| if (_no_warmup_num_read_blocks->get_value() > 0) { |
There was a problem hiding this comment.
a quick fix for div-by-zero bug
|
run buildall |
TPC-H: Total hot run time: 29264 ms |
TPC-DS: Total hot run time: 168476 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run p0 |
|
run cloud_p0 |
|
run nonConcurrent |
|
/review |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
There was a problem hiding this comment.
Review conclusion: no additional blocking issue found in the actual GitHub PR diff.
Critical checkpoint conclusions:
- Goal/test proof: The PR addresses adaptive file-cache queue consumption and queue visibility. BE unit tests cover bounded LRU log replay, dump-counter preservation, hard-cap drops, need-update LRU block behavior, and file-cache read metrics. A docker regression test is added for warmup read metrics.
- Scope/minimality: The GitHub PR file list is focused on BE file-cache code plus related tests. The earlier local-only unrelated FE/README diff was not part of GitHub's accepted PR diff.
- Concurrency: Reviewed the LRU recorder mutex usage, bounded replay, dump-drain-before-reset flow, block LRU update slicing under the cache lock, and the close condition-variable loops. No new distinct concurrency blocker found beyond the already-known/replied review threads.
- Lifecycle: Background threads are still joined in the cache destructor; the added storage-engine metric deregistration is appropriate for destructor lifecycle.
- Configuration: Extra adaptive knobs from earlier review context are no longer present; the remaining adaptive thresholds are internal constants. Existing mutable config values are still read in the background loops.
- Compatibility/storage/transactions: No storage format, FE/BE protocol, EditLog, or transaction compatibility issue identified.
- Parallel paths: File-cache read accounting now splits local/remote/peer bytes for direct, indirect, warmup, fallback, and mixed-hit paths; tests exercise these paths.
- Tests: Relevant BE unit coverage is present. I did not run tests in this review runner.
- Observability/performance: Queue length/drop metrics and high-backlog warnings are present; lock slicing reduces long cache-lock holds. No additional observability blocker found.
- User focus: No additional user-provided review focus was specified.
Existing review threads were checked first; I did not repeat the previously raised metrics, wait interval, or dump/replay race comments.
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
skip buildall |
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: File cache LRU recorder stores pending LRU operations before replaying them into the shadow queue for dump and restore. Hot cache blocks can repeatedly produce MOVETOBACK records faster than the replay thread consumes them, so the recorder backlog can grow with redundant pending moves. If earlier REMOVE records are dropped by the hard cap, later ADD records can also create duplicate shadow queue nodes for the same block.
This change stores pending recorder logs in a lock-protected list, coalesces duplicate pending MOVETOBACK records by moving the latest record to the list tail, and makes shadow queue ADD replay idempotent by replacing any existing entry for the same block. This keeps the recorder backlog smaller under hot-block access patterns and prevents duplicated shadow entries after dropped logs.
### Release note
None
### Check List (For Author)
- Test:
- Unit Test: `CCACHE_DISABLE=1 JAVA_HOME=/mnt/disk1/zhangzhengyu/build-dep/jdk-17.0.2/ DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 EXTRA_CXX_FLAGS='-Wno-error=deprecated-literal-operator' sh run-be-ut.sh --run --filter=CacheLRUDumperTest.*`
- Unit Test: `CCACHE_DISABLE=1 JAVA_HOME=/mnt/disk1/zhangzhengyu/build-dep/jdk-17.0.2/ DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 EXTRA_CXX_FLAGS='-Wno-error=deprecated-literal-operator' sh run-be-ut.sh --run --filter=NeedUpdateLRUBlocksTest.*:BlockFileCacheTest.test_lru_log_replay_bound_and_disable_record:BlockFileCacheTest.test_lru_log_record_replay_dump_restore`
- Format: `PATH=/mnt/disk1/zhangzhengyu/build-dep/ldb_toolchain.back/bin:$PATH build-support/check-format.sh`
- Check: `git diff --check`
- Static Analysis: `build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN` could not complete because local clang-tidy cannot analyze existing headers in this environment (`stddef.h` not found from the toolchain include path and existing `be/src/core/types.h` NOLINT diagnostics).
- Behavior changed: No
- Does this need documentation: No
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found one concurrency issue that should be fixed before merge.
Critical checkpoint conclusions:
- Goal/test: The PR aims to keep file-cache background queues bounded/adaptive and exclude warmup reads from global cache metrics; it includes BE unit tests and one cloud regression test for the main intended behavior.
- Scope: The implementation is mostly focused on file-cache metrics and LRU recorder/dumper behavior.
- Concurrency: There is a new inconsistent locking path for the shadow LRU queue between dump replay and optional queue-diff evaluation, detailed inline.
- Lifecycle/static initialization: No additional blocking lifecycle issue found.
- Configuration: Existing mutable file-cache configs are used; no new config knobs in the final PR scope.
- Compatibility/persistence: The previous dump/replay stale-counter concern is addressed by draining pending logs before dumping; no duplicate comment submitted.
- Parallel paths: Warmup metrics and direct/indirect read paths were reviewed; no additional issue found beyond the inline concurrency problem.
- Tests: Relevant BE unit tests are added; the new cloud regression is broad but does not cover the optional shadow-diff race.
- Observability/performance: Metrics were simplified relative to existing comments; no additional blocking issue found.
- User focus: No additional user-provided review focus was supplied.
| std::lock_guard<std::mutex> lru_log_lock(_recorder->_mutex_lru_log); | ||
| // Drain logs counted for this dump before updating the counter, otherwise restore may | ||
| // see a stale LRU tail after a crash. | ||
| _recorder->replay_queue_event_locked(type, 0, lru_log_lock); |
There was a problem hiding this comment.
This now mutates the shadow LRU queue from the dump thread while holding only _mutex_lru_log, but LRUQueueRecorder::evaluate_queue_diff() still reads the same shadow queue without that mutex (it is called from run_background_lru_log_replay() under only the cache mutex when enable_evaluate_shadow_queue_diff=true). Before this change the dump path only collected the shadow queue under _mutex_lru_log; after this replay call, dump can concurrently erase/splice/add entries while evaluation copies/traverses target.queue and target.map, causing a data race and possible iterator/container corruption. Please make the shadow queue locking consistent, e.g. take _mutex_lru_log around evaluate_queue_diff()'s access or move the diff calculation onto a snapshot collected under that mutex.
TPC-H: Total hot run time: 29245 ms |
TPC-DS: Total hot run time: 169520 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Problem Summary: File cache background consumers used fixed intervals and batch sizes for LRU recorder log replay and _need_update_lru_blocks updates. When producers outpaced those consumers, backlog growth was hard to observe and could increase memory pressure. This change adds queue length metrics for LRU recorder log queues, exposes queue-size accessors, supports bounded LRU log replay, and makes both background consumers adapt their interval and batch size according to backlog watermarks. It also slices block LRU update work into smaller lock-hold batches and skips LRU log recording when tail-record retention is disabled.
None
CCACHE_DISABLE=1 DORIS_TOOLCHAIN=clang DISABLE_BE_JAVA_EXTENSIONS=ON ENABLE_INJECTION_POINT=ON ENABLE_CACHE_LOCK_DEBUG=0 ENABLE_PCH=0 EXTRA_CXX_FLAGS='-Wno-error=deprecated-literal-operator' sh run-be-ut.sh --run --filter=BlockFileCacheTest.test_lru_log_replay_bound_and_disable_recordbuild-support/check-format.shgit diff --checkbuild-support/run-clang-tidy.sh --base origin/master --build-dir be/ut_build_ASAN; it was blocked by pre-existing/file-level diagnostics and system header lookup errors before producing a clean result.What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)